Introduction

The shiny package makes it possible to build, on top of R functions, a JavaScript web page that interacts with the R code and displays the results, all within a web browser. This is a good way to build a proof of concept (POC) and validate the value of our code before developing full software around it.

Folders and files:

The app.R script contains the Shiny web application, both the server and the UI.
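
As an illustration of what "both the server and the UI" in a single app.R means, here is a minimal sketch. It is not the application developed in this project: the slider, the plot and the input/output names (n, scatter) are hypothetical and only show how the two parts fit together.

```r
library(shiny)

# UI: what the browser displays (one input widget, one output placeholder).
ui <- fluidPage(
  titlePanel("Minimal app.R sketch"),
  sliderInput("n", "Number of points", min = 10, max = 200, value = 50),
  plotOutput("scatter")
)

# Server: the R code that reacts to the inputs and fills the outputs.
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    # Re-runs automatically every time the slider value changes.
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

# Combine both parts into one app object (hypothetical name `app`).
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, runApp(app) starts the application in the browser. The object is only assigned here so that the sketch can be sourced without launching a server.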

The data provided for this exercise is an .RData file called AirBnB.RData, which contains data on AirBnB listings in Paris.

Exercise:

We were asked to explore and analyse the Paris dataset by creating a Shiny application, which should cover:

  • Relationship between prices and apartment features
  • Number of apartments per owner
  • Renting price per city quarter (“arrondissements”)
  • Visit frequency of the different quarters according to time

Approach

I treat the following variables in the dataset as features:

  • Room type
  • Property type
  • Neighborhood
  • Price
  • Type of owner (host vs superhost)
  • Location of the listings

Based on these features, I developed the analysis of the dataset.

Creating an interactive website with the Shiny package

Prerequisites

The first step is to install the Shiny packages and their dependencies (here through shinyjs, which depends on shiny), as well as the EDAWR package, which is installed from GitHub with devtools:

install.packages("shinyjs", dependencies=TRUE)
devtools::install_github("rstudio/EDAWR")

After this, we can load all the packages used in the project. If any of these packages is not yet installed, it has to be installed first, following the previous step:

library(shiny)
library(shinydashboard)
## Warning: package 'shinydashboard' was built under R version 4.0.5
## 
## Attaching package: 'shinydashboard'
## The following object is masked from 'package:graphics':
## 
##     box
library(shinyalert)
## Warning: package 'shinyalert' was built under R version 4.0.5
## 
## Attaching package: 'shinyalert'
## The following object is masked from 'package:shiny':
## 
##     runExample
library(shinycssloaders)
## Warning: package 'shinycssloaders' was built under R version 4.0.5
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(EDAWR)
## 
## Attaching package: 'EDAWR'
## The following object is masked from 'package:dplyr':
## 
##     storms
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
## 
## Attaching package: 'tidyr'
## The following objects are masked from 'package:EDAWR':
## 
##     population, who
library(stringr)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.4.0     v purrr   0.3.4
## v tibble  3.1.8     v forcats 0.5.0
## v readr   1.3.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggmap)
## i Google's Terms of Service: <https://mapsplatform.google.com>
## i Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggmap':
## 
##     wind
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(DT)
## Warning: package 'DT' was built under R version 4.0.3
## 
## Attaching package: 'DT'
## The following objects are masked from 'package:shiny':
## 
##     dataTableOutput, renderDataTable
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.0.5
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.0.5
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 4.0.5
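
Since any package that is not yet installed has to be installed before library() can load it, a small check can list what is missing. The following is only a sketch: the helper name missing_packages is hypothetical, and the install call is left commented out so that nothing is installed by accident.

```r
# Packages loaded in this document (EDAWR comes from GitHub, not CRAN).
required <- c("shiny", "shinydashboard", "shinyalert", "shinycssloaders",
              "dplyr", "EDAWR", "tidyr", "stringr", "tidyverse", "ggmap",
              "ggplot2", "plotly", "DT", "leaflet", "ggpubr", "RColorBrewer")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install the missing CRAN packages:
# install.packages(setdiff(to_install, "EDAWR"), dependencies = TRUE)
```

Note that requireNamespace() loads a package's namespace without attaching it, so the library() calls are still needed afterwards.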

Preprocessing the dataset

First, load the dataset:

load("AirBnB.RData")
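
To see which objects load() restores, its return value can be inspected. The sketch below uses a small made-up .RData file as a stand-in for AirBnB.RData so that it runs on its own; the toy columns (in particular those of R) are assumptions and do not describe the real dataset.

```r
# Toy stand-in for AirBnB.RData (made-up data, hypothetical columns).
L <- data.frame(id = 1:3, price = c("$60.00", "$200.00", "$80.00"))
R <- data.frame(listing_id = c(1, 1, 2), date = as.Date("2016-07-01") + 0:2)
tmp <- tempfile(fileext = ".RData")
save(L, R, file = tmp)
rm(L, R)

# load() invisibly returns the names of the objects it restores.
loaded <- load(tmp)
print(loaded)

# Report the class and dimensions of each restored object.
for (name in loaded) {
  obj <- get(name)
  cat(name, ":", class(obj), "with", nrow(obj), "rows and",
      ncol(obj), "columns\n")
}
```

With the real file, replacing tmp by "AirBnB.RData" gives the same kind of overview before calling head().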

After that, two objects, named L and R, are loaded into the workspace. We can have a look at the first rows of each:

head(L)
##         id                           listing_url   scrape_id last_scraped
## 1  4867396  https://www.airbnb.com/rooms/4867396 2.01607e+13   2016-07-03
## 2  7704653  https://www.airbnb.com/rooms/7704653 2.01607e+13   2016-07-04
## 3  2725029  https://www.airbnb.com/rooms/2725029 2.01607e+13   2016-07-04
## 4  9337509  https://www.airbnb.com/rooms/9337509 2.01607e+13   2016-07-03
## 5 12928158 https://www.airbnb.com/rooms/12928158 2.01607e+13   2016-07-04
## 6  5589471  https://www.airbnb.com/rooms/5589471 2.01607e+13   2016-07-04
##                                        name
## 1       Appartement 60m2 Rue Legendre 75017
## 2       Appart au pied de l'arc de triomphe
## 3            Nice appartment in Batignolles
## 4            Charming flat near Batignolles
## 5 Spacious bedroom near the centre of Paris
## 6           Rare, Maison individuelle 200m2
##                                                                                                                                                                                                                                                         summary
## 1   Au 2ème étage d'un bel immeuble joli 2 pièces meublé comprenant: une grande pièce à vivre lumineuse, une chambre, une cuisine, salle de douche et WC séparé. Appartement très calme et lumineux. A proximité de nombreux commerces et transports.
## 2 Nous proposons cette appartement situé en plein coeur de Paris, au pied de l'arc de triomphe. Commerçants, métro, cinéma, vous trouverez à proximité tout ce qu'il faut pour passer quelques jours à Paris en amoureux, entre copains ou en famille ! 
## 3                                                                                                                                   Located in the very charming Batignolles, this cozy and bright two-room appartment will perfectly suit your stay in Paris. 
## 4                                                 Welcome to my apartment ! This a quiet and cosy flat with 2 room (25 sqm2) fully furnished closed to trendy Batignolles area in the heart of the 17th district. (Near Montmartre foothill / Place de Clichy).
## 5                                                                                                                                                                                                   Spacious, quiet and bright room, ideal to explore and enjoy
## 6                                                             Maison individuelle, 200 m2 habitable,rénovée en 2013. Quartier résidentiel, nombreux commerces, restaurants.  Maison familiale, pouvant accueillir 5 adultes et un enfant (1 lit en hauteur).
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          space
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## 2 L'appartement est composé de : - une grande chambre (environ 15m2) avec un lit simple et d'un matelas d'appoint - une salle de bain avec douche, lave linge/sèche linge - un autre chambre (environ 10m2) avec un lit double (lit gigogne) et une salle de bain dans la chambre (douche) - un grand salon avec une cuisine ouverte (environ 35 m2) - wc séparé Le cuisine est tout équipé : machine nespresso, cocotte-minute, mixeur, lave vaisselle... L'appartement est très lumineux puisqu'il donne sur une avenue large mais calme. Vous trouverez à proximité plein de commercants, de bar pour sortir, de restaurants, des cinémas, des musées. Vous serez au coeur de la ville !  N'hésitez pas à nous contacter pour plus d'information, de photos...
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Au 2ème étage d'un bel immeuble joli 2 pièces meublé comprenant: une grande pièce à vivre lumineuse, une chambre, une cuisine, salle de douche et WC séparé. Appartement très calme et lumineux. A proximité de nombreux commerces et transports.
## 2 Nous proposons cette appartement situé en plein coeur de Paris, au pied de l'arc de triomphe. Commerçants, métro, cinéma, vous trouverez à proximité tout ce qu'il faut pour passer quelques jours à Paris en amoureux, entre copains ou en famille ! L'appartement est composé de : - une grande chambre (environ 15m2) avec un lit simple et d'un matelas d'appoint - une salle de bain avec douche, lave linge/sèche linge - un autre chambre (environ 10m2) avec un lit double (lit gigogne) et une salle de bain dans la chambre (douche) - un grand salon avec une cuisine ouverte (environ 35 m2) - wc séparé Le cuisine est tout équipé : machine nespresso, cocotte-minute, mixeur, lave vaisselle... L'appartement est très lumineux puisqu'il donne sur une avenue large mais calme. Vous trouverez à proximité plein de commercants, de bar pour sortir, de restaurants, des cinémas, des musées. Vous serez au coeur de la ville !  N'hésitez pas à nous contacter pour plus d'information, de photos...
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Located in the very charming Batignolles, this cozy and bright two-room appartment will perfectly suit your stay in Paris.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Welcome to my apartment ! This a quiet and cosy flat with 2 room (25 sqm2) fully furnished closed to trendy Batignolles area in the heart of the 17th district. (Near Montmartre foothill / Place de Clichy).
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Spacious, quiet and bright room, ideal to explore and enjoy
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Maison individuelle, 200 m2 habitable,rénovée en 2013. Quartier résidentiel, nombreux commerces, restaurants.  Maison familiale, pouvant accueillir 5 adultes et un enfant (1 lit en hauteur).
##   experiences_offered neighborhood_overview notes transit access interaction
## 1                none                                                       
## 2                none                                                       
## 3                none                                                       
## 4                none                                                       
## 5                none                                                       
## 6                none                                                       
##   house_rules
## 1            
## 2            
## 3            
## 4            
## 5            
## 6            
##                                                                                   thumbnail_url
## 1                                                                                              
## 2           https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=small
## 3                                                                                              
## 4                                                                                              
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=small
## 6                                                                                              
##                                                                                       medium_url
## 1                                                                                               
## 2           https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=medium
## 3                                                                                               
## 4                                                                                               
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=medium
## 6                                                                                               
##                                                                                     picture_url
## 1           https://a1.muscache.com/im/pictures/61090424/02c8a8bb_original.jpg?aki_policy=large
## 2           https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=large
## 3           https://a1.muscache.com/im/pictures/96821426/ea9864f1_original.jpg?aki_policy=large
## 4 https://a2.muscache.com/im/pictures/5fa65f2d-b159-4fb5-986a-bd36cb92d2bc.jpg?aki_policy=large
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=large
## 6           https://a2.muscache.com/im/pictures/69589240/79d976c4_original.jpg?aki_policy=large
##                                                                                    xl_picture_url
## 1                                                                                                
## 2           https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=x_large
## 3                                                                                                
## 4                                                                                                
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=x_large
## 6                                                                                                
##    host_id                                   host_url host_name host_since
## 1  9703910  https://www.airbnb.com/users/show/9703910  Matthieu 2013-10-29
## 2 35777602 https://www.airbnb.com/users/show/35777602    Claire 2015-06-14
## 3 13945253 https://www.airbnb.com/users/show/13945253   Vincent 2014-04-06
## 4  5107123  https://www.airbnb.com/users/show/5107123     Julie 2013-02-16
## 5 51195601 https://www.airbnb.com/users/show/51195601   Daniele 2015-12-13
## 6 28980052 https://www.airbnb.com/users/show/28980052  Philippe 2015-03-08
##                      host_location
## 1 Nantes, Pays de la Loire, France
## 2     Paris, Île-de-France, France
## 3     Paris, Île-de-France, France
## 4     Paris, Île-de-France, France
## 5            Prato, Toscana, Italy
## 6     Paris, Île-de-France, France
##                                                                  host_about
## 1                                                                          
## 2                                                                          
## 3                                                                          
## 4 Nous sommes un jeune couple vivant à Paris. Nous aimons beaucoup voyager
## 5                                                                          
## 6                                                                          
##   host_response_time host_response_rate host_acceptance_rate host_is_superhost
## 1                N/A                N/A                  N/A                 f
## 2                N/A                N/A                  N/A                 f
## 3     within an hour               100%                  N/A                 f
## 4       within a day                50%                  N/A                 f
## 5     within an hour               100%                  60%                 f
## 6                N/A                N/A                  N/A                 f
##                                                                                       host_thumbnail_url
## 1  https://a0.muscache.com/im/users/9703910/profile_pic/1383073563/original.jpg?aki_policy=profile_small
## 2 https://a1.muscache.com/im/users/35777602/profile_pic/1438688930/original.jpg?aki_policy=profile_small
## 3 https://a0.muscache.com/im/users/13945253/profile_pic/1396781528/original.jpg?aki_policy=profile_small
## 4  https://a1.muscache.com/im/users/5107123/profile_pic/1425849895/original.jpg?aki_policy=profile_small
## 5  https://a2.muscache.com/im/pictures/e984ba68-7571-46d9-99dc-735ec6e5c9d6.jpg?aki_policy=profile_small
## 6 https://a0.muscache.com/im/users/28980052/profile_pic/1425844331/original.jpg?aki_policy=profile_small
##                                                                                            host_picture_url
## 1  https://a0.muscache.com/im/users/9703910/profile_pic/1383073563/original.jpg?aki_policy=profile_x_medium
## 2 https://a1.muscache.com/im/users/35777602/profile_pic/1438688930/original.jpg?aki_policy=profile_x_medium
## 3 https://a0.muscache.com/im/users/13945253/profile_pic/1396781528/original.jpg?aki_policy=profile_x_medium
## 4  https://a1.muscache.com/im/users/5107123/profile_pic/1425849895/original.jpg?aki_policy=profile_x_medium
## 5  https://a2.muscache.com/im/pictures/e984ba68-7571-46d9-99dc-735ec6e5c9d6.jpg?aki_policy=profile_x_medium
## 6 https://a0.muscache.com/im/users/28980052/profile_pic/1425844331/original.jpg?aki_policy=profile_x_medium
##   host_neighbourhood host_listings_count host_total_listings_count
## 1        Batignolles                   1                         1
## 2     Champs-Elysées                   1                         1
## 3        Batignolles                   1                         1
## 4        Batignolles                   1                         1
## 5             Ternes                   1                         1
## 6        Batignolles                   1                         1
##                       host_verifications host_has_profile_pic
## 1          ['email', 'phone', 'reviews']                    t
## 2          ['email', 'phone', 'reviews']                    t
## 3          ['email', 'phone', 'reviews']                    t
## 4 ['email', 'phone', 'reviews', 'jumio']                    t
## 5 ['email', 'phone', 'reviews', 'jumio']                    t
## 6                     ['email', 'phone']                    t
##   host_identity_verified
## 1                      f
## 2                      f
## 3                      f
## 4                      t
## 5                      t
## 6                      f
##                                                  street  neighbourhood
## 1      Rue Legendre, Paris, Île-de-France 75017, France    Batignolles
## 2  Avenue Mac-Mahon, Paris, Île-de-France 75017, France Champs-Elysées
## 3  Rue la Condamine, Paris, Île-de-France 75017, France    Batignolles
## 4       Rue Gauthey, Paris, Île-de-France 75017, France    Batignolles
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France         Ternes
## 6   Rue de Saussure, Paris, Île-de-France 75017, France    Batignolles
##   neighbourhood_cleansed neighbourhood_group_cleansed  city         state
## 1    Batignolles-Monceau                           NA Paris Île-de-France
## 2    Batignolles-Monceau                           NA Paris Île-de-France
## 3    Batignolles-Monceau                           NA Paris Île-de-France
## 4    Batignolles-Monceau                           NA Paris Île-de-France
## 5    Batignolles-Monceau                           NA Paris Île-de-France
## 6    Batignolles-Monceau                           NA Paris Île-de-France
##   zipcode market smart_location country_code country latitude longitude
## 1   75017  Paris  Paris, France           FR  France 48.88880  2.320466
## 2   75017  Paris  Paris, France           FR  France 48.87664  2.293724
## 3   75017  Paris  Paris, France           FR  France 48.88384  2.321031
## 4   75017  Paris  Paris, France           FR  France 48.89236  2.322338
## 5   75017  Paris  Paris, France           FR  France 48.88942  2.298321
## 6   75017  Paris  Paris, France           FR  France 48.88707  2.312212
##   is_location_exact property_type       room_type accommodates bathrooms
## 1                 t     Apartment Entire home/apt            2         1
## 2                 t     Apartment Entire home/apt            4         2
## 3                 t     Apartment Entire home/apt            2         1
## 4                 t     Apartment Entire home/apt            2         1
## 5                 t     Apartment    Private room            2         1
## 6                 t         House Entire home/apt            6         3
##   bedrooms beds bed_type
## 1        1    1 Real Bed
## 2        2    3 Real Bed
## 3        1    1 Real Bed
## 4        1    1 Real Bed
## 5        1    1 Real Bed
## 6        4    4 Real Bed
##                                                                                                                                                       amenities
## 1                                                                          {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2                                                       {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3                                          {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4                                                                                                       {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6                          {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
##   square_feet   price weekly_price monthly_price security_deposit cleaning_fee
## 1          NA  $60.00      $388.00                        $200.00       $20.00
## 2          NA $200.00                                                         
## 3          NA  $80.00      $501.00     $1,503.00          $501.00             
## 4          NA  $60.00                                     $250.00             
## 5          NA  $50.00                                                         
## 6          NA $191.00                                                   $50.00
##   guests_included extra_people minimum_nights maximum_nights calendar_updated
## 1               1        $0.00              1           1125     5 months ago
## 2               1        $0.00              1           1125    11 months ago
## 3               1        $0.00              3           1125            today
## 4               0        $0.00              2           1125     8 months ago
## 5               1        $0.00              1             30      4 weeks ago
## 6               1        $0.00              3           1125     5 months ago
##   has_availability availability_30 availability_60 availability_90
## 1               NA               0               0               0
## 2               NA               0               0               0
## 3               NA               6              23              23
## 4               NA              29              59              89
## 5               NA              29              59              89
## 6               NA               0               0               0
##   availability_365 calendar_last_scraped number_of_reviews first_review
## 1                0            2016-07-03                 1   2015-05-19
## 2                0            2016-07-04                 0             
## 3              298            2016-07-04                 1   2015-10-10
## 4              364            2016-07-03                 1   2015-12-15
## 5               89            2016-07-04                 2   2016-06-17
## 6                0            2016-07-04                 0             
##   last_review review_scores_rating review_scores_accuracy
## 1  2015-05-19                  100                     10
## 2                               NA                     NA
## 3  2015-10-10                   80                     NA
## 4  2015-12-15                   80                      6
## 5  2016-06-17                  100                     10
## 6                               NA                     NA
##   review_scores_cleanliness review_scores_checkin review_scores_communication
## 1                        10                    10                          10
## 2                        NA                    NA                          NA
## 3                        NA                    NA                          NA
## 4                        10                     8                          10
## 5                        10                    10                          10
## 6                        NA                    NA                          NA
##   review_scores_location review_scores_value requires_license license
## 1                     10                  10                f        
## 2                     NA                  NA                f        
## 3                     NA                  NA                f        
## 4                      6                   8                f        
## 5                     10                  10                f        
## 6                     NA                  NA                f        
##   jurisdiction_names instant_bookable cancellation_policy
## 1              Paris                f            flexible
## 2              Paris                f            flexible
## 3              Paris                f            flexible
## 4              Paris                f            flexible
## 5              Paris                f            flexible
## 6              Paris                f            flexible
##   require_guest_profile_picture require_guest_phone_verification
## 1                             f                                f
## 2                             f                                f
## 3                             f                                f
## 4                             f                                f
## 5                             f                                f
## 6                             f                                f
##   calculated_host_listings_count reviews_per_month
## 1                              1              0.07
## 2                              1                NA
## 3                              1              0.11
## 4                              1              0.15
## 5                              1              2.00
## 6                              1                NA
head(R)
##   listing_id       date
## 1   12007141 2016-04-16
## 2   12007141 2016-04-26
## 3   12007141 2016-05-03
## 4   12007141 2016-06-15
## 5    6666099 2015-06-21
## 6    6666099 2015-07-27

We observe the following:

  • The data frame L contains 95 variables of different types, with one row per listing.

  • The data frame R contains only two variables: listing_id and date.

L will be used to analyse the listing features, and R will be used to compute the visit frequency of the different quarters over time.
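
As a rough illustration of how the two tables can be combined, the sketch below joins review dates to listings and counts reviews per quarter and month. The toy data frames `listings` and `reviews` and their values are hypothetical stand-ins for L and R, and using one review as a proxy for one visit is an assumption.

```r
# Toy stand-ins for L (listings) and R (review dates); all values are hypothetical.
listings <- data.frame(id = c(1, 2, 3),
                       zipcode = c("75017", "75011", "75017"))
reviews <- data.frame(listing_id = c(1, 1, 2, 3, 3, 3),
                      date = as.Date(c("2016-04-16", "2016-05-03", "2016-04-26",
                                       "2016-04-02", "2016-04-20", "2016-05-11")))

# Attach the quarter (zipcode) to every review through the listing id.
visits <- merge(reviews, listings, by.x = "listing_id", by.y = "id")

# Count reviews per quarter and per month.
visits$month <- format(visits$date, "%Y-%m")
visit_counts <- as.data.frame(table(quarter = visits$zipcode, month = visits$month))
visit_counts
```

In the real data the date column may be stored as a factor or character vector, in which case it would need as.Date() before formatting.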

Using the select() function from dplyr, we create a subset of L that keeps only the variables (out of the 95) that are useful for the project, renaming some of them along the way:

data <- select(L, listing_id = id, host_id, host_name, bathrooms, bedrooms, 
               beds, bed_type, equipments= amenities, type= property_type, room= room_type, 
               nb_of_guests= accommodates, price, guests_included, minimum_nights, 
               maximum_nights,availability_over_one_year= availability_365, instant_bookable, 
               cancellation_policy, city, address= street, neighbourhood=neighbourhood_cleansed, 
               city_quarter=zipcode, latitude, longitude, security_deposit, transit, 
               host_response_time, superhost= host_is_superhost, host_since, 
               listing_count= calculated_host_listings_count, host_score= review_scores_rating, 
               reviews_per_month, number_of_reviews)
head(data)
##   listing_id  host_id host_name bathrooms bedrooms beds bed_type
## 1    4867396  9703910  Matthieu         1        1    1 Real Bed
## 2    7704653 35777602    Claire         2        2    3 Real Bed
## 3    2725029 13945253   Vincent         1        1    1 Real Bed
## 4    9337509  5107123     Julie         1        1    1 Real Bed
## 5   12928158 51195601   Daniele         1        1    1 Real Bed
## 6    5589471 28980052  Philippe         3        4    4 Real Bed
##                                                                                                                                                      equipments
## 1                                                                          {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2                                                       {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3                                          {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4                                                                                                       {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6                          {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
##        type            room nb_of_guests   price guests_included minimum_nights
## 1 Apartment Entire home/apt            2  $60.00               1              1
## 2 Apartment Entire home/apt            4 $200.00               1              1
## 3 Apartment Entire home/apt            2  $80.00               1              3
## 4 Apartment Entire home/apt            2  $60.00               0              2
## 5 Apartment    Private room            2  $50.00               1              1
## 6     House Entire home/apt            6 $191.00               1              3
##   maximum_nights availability_over_one_year instant_bookable
## 1           1125                          0                f
## 2           1125                          0                f
## 3           1125                        298                f
## 4           1125                        364                f
## 5             30                         89                f
## 6           1125                          0                f
##   cancellation_policy  city
## 1            flexible Paris
## 2            flexible Paris
## 3            flexible Paris
## 4            flexible Paris
## 5            flexible Paris
## 6            flexible Paris
##                                                   address       neighbourhood
## 1       Rue Legendre, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 2   Avenue Mac-Mahon, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 3   Rue la Condamine, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 4        Rue Gauthey, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France Batignolles-Monceau
## 6    Rue de Saussure, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
##   city_quarter latitude longitude security_deposit transit host_response_time
## 1        75017 48.88880  2.320466          $200.00                        N/A
## 2        75017 48.87664  2.293724                                         N/A
## 3        75017 48.88384  2.321031          $501.00             within an hour
## 4        75017 48.89236  2.322338          $250.00               within a day
## 5        75017 48.88942  2.298321                              within an hour
## 6        75017 48.88707  2.312212                                         N/A
##   superhost host_since listing_count host_score reviews_per_month
## 1         f 2013-10-29             1        100              0.07
## 2         f 2015-06-14             1         NA                NA
## 3         f 2014-04-06             1         80              0.11
## 4         f 2013-02-16             1         80              0.15
## 5         f 2015-12-13             1        100              2.00
## 6         f 2015-03-08             1         NA                NA
##   number_of_reviews
## 1                 1
## 2                 0
## 3                 1
## 4                 1
## 5                 2
## 6                 0

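As part of the cleaning of the dataset, duplicate listings need to be removed. The result has to be assigned back to data, otherwise the de-duplicated table is only printed:

data <- data %>% distinct(listing_id, .keep_all = TRUE)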
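
Before dropping rows it can be useful to check how many duplicates there actually are. The following is a minimal base R sketch on a hypothetical toy table (`toy` and its values are made up for illustration); it mirrors what distinct() does when keyed on listing_id.

```r
# Hypothetical listing ids with one repeated entry.
toy <- data.frame(listing_id = c(101, 102, 102, 103),
                  price = c(60, 80, 80, 50))

# duplicated() flags the second and later occurrences of each id.
n_dup <- sum(duplicated(toy$listing_id))

# Keep the first row for each id; the result must be assigned to take effect.
toy_unique <- toy[!duplicated(toy$listing_id), ]
```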

Also, the $ sign and the thousands separator in the prices cause problems when manipulating the numbers, so they need to be removed as well:

data$price <- substring(gsub(",", "", as.character(data$price)),2)

Finally, we need to ensure that the variables have the appropriate data type.

Converting numeric columns:

data$bathrooms <- as.numeric(data$bathrooms)
data$bedrooms <- as.numeric(data$bedrooms)
data$beds <- as.numeric(data$beds)
data$price <- as.numeric(data$price)
data$guests_included <- as.numeric(data$guests_included)
data$minimum_nights <- as.numeric(data$minimum_nights)
data$maximum_nights <- as.numeric(data$maximum_nights)
data$availability_over_one_year <- as.numeric(data$availability_over_one_year)
data$security_deposit <- as.numeric(data$security_deposit)
data$listing_count <- as.numeric(data$listing_count)
data$host_score <- as.numeric(data$host_score)
data$reviews_per_month <- as.numeric(data$reviews_per_month)
data$number_of_reviews <- as.numeric(data$number_of_reviews)
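
One caveat applies to currency columns that are still stored as factors, such as security_deposit: as.numeric() on a factor returns the internal level codes rather than the amounts. This appears consistent with the cleaned output, where deposits of $200.00 and $501.00 show up as 94 and 208. The sketch below illustrates the pitfall and one possible fix; `parse_currency` is a hypothetical helper, not part of the original app.

```r
# Hypothetical currency column stored as a factor, as happens when strings
# are read as factors by default.
deposit <- factor(c("$200.00", "", "$1,503.00"))

# Pitfall: as.numeric() on a factor returns the internal level codes.
codes <- as.numeric(deposit)

# Strip every character except digits and the decimal point, then convert.
# Empty strings become NA.
parse_currency <- function(x) {
  as.numeric(gsub("[^0-9.]", "", as.character(x)))
}
amounts <- parse_currency(deposit)
```

The price column does not suffer from this, because it goes through as.character() and gsub() before the numeric conversion.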

Converting character columns:

data$neighbourhood <- as.character(data$neighbourhood)

Some neighbourhood names have encoding issues, so we rewrite them with the intended spelling:

data[data == "Panthéon"] <- "Panthéon"
data[data == "Opéra"] <- "Opéra"
data[data == "Entrepôt"] <- "Entrepôt"
data[data == "Élysée"] <- "Elysée"
data[data == "Ménilmontant"] <- "Mesnilmontant"
data[data == "Hôtel-de-Ville"] <- "Hôtel-de-Ville"
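
Note that data[data == "..."] compares every cell of the data frame, so a matching value in any other column would be rewritten too. A possible alternative, sketched below on hypothetical data, restricts the replacement to the neighbourhood column through a lookup. The mis-encoded strings in `wrong` are assumed examples; in practice they must match the values exactly as stored in the dataset.

```r
# Toy table; the mis-encoded names are written with \u escapes so that the
# example does not depend on the encoding of the script file.
toy <- data.frame(neighbourhood = c("Panth\u00c3\u00a9on", "Louvre", "Op\u00c3\u00a9ra"),
                  host_name = c("Marie", "Louvre", "Pierre"),
                  stringsAsFactors = FALSE)

wrong <- c("Panth\u00c3\u00a9on", "Op\u00c3\u00a9ra")  # assumed mis-encoded forms
right <- c("Panth\u00e9on", "Op\u00e9ra")              # corrected spellings

# Look up each neighbourhood in the list of wrong names and replace the hits.
idx <- match(toy$neighbourhood, wrong)
hit <- !is.na(idx)
toy$neighbourhood[hit] <- right[idx[hit]]
```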

Note that some columns have missing values. The approach followed here is to fill the missing values of bathrooms, bedrooms and beds with the mean of the corresponding column:

temp = mean(data$bathrooms, na.rm = TRUE) 
val = is.na(data$bathrooms) 
data$bathrooms[val] = temp
temp = mean(data$bedrooms, na.rm = TRUE)
val = is.na(data$bedrooms)
data$bedrooms[val] = temp
temp = mean(data$beds, na.rm = TRUE) 
val = is.na(data$beds)
data$beds[val] = temp
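
The same three steps can be written once and applied to several columns. This is a minimal sketch on a hypothetical toy table; `impute_mean` is an illustrative helper name, not a function from any package.

```r
# Replace the NA entries of a numeric vector by the mean of its observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Hypothetical data with one missing value per column.
toy <- data.frame(bathrooms = c(1, NA, 2),
                  bedrooms  = c(1, 3, NA),
                  beds      = c(NA, 2, 4))

# Apply the helper to every column that needs imputation.
cols <- c("bathrooms", "bedrooms", "beds")
toy[cols] <- lapply(toy[cols], impute_mean)
```

Mean imputation produces fractional values for count variables such as beds; the median would keep them closer to realistic counts, at the cost of departing from the approach used here.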

The data is now cleaned. Let’s have a look at the first rows of our new dataset:

head(data)
##   listing_id  host_id host_name bathrooms bedrooms beds bed_type
## 1    4867396  9703910  Matthieu         1        1    1 Real Bed
## 2    7704653 35777602    Claire         2        2    3 Real Bed
## 3    2725029 13945253   Vincent         1        1    1 Real Bed
## 4    9337509  5107123     Julie         1        1    1 Real Bed
## 5   12928158 51195601   Daniele         1        1    1 Real Bed
## 6    5589471 28980052  Philippe         3        4    4 Real Bed
##                                                                                                                                                      equipments
## 1                                                                          {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2                                                       {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3                                          {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4                                                                                                       {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6                          {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
##        type            room nb_of_guests price guests_included minimum_nights
## 1 Apartment Entire home/apt            2    60               1              1
## 2 Apartment Entire home/apt            4   200               1              1
## 3 Apartment Entire home/apt            2    80               1              3
## 4 Apartment Entire home/apt            2    60               0              2
## 5 Apartment    Private room            2    50               1              1
## 6     House Entire home/apt            6   191               1              3
##   maximum_nights availability_over_one_year instant_bookable
## 1           1125                          0                f
## 2           1125                          0                f
## 3           1125                        298                f
## 4           1125                        364                f
## 5             30                         89                f
## 6           1125                          0                f
##   cancellation_policy  city
## 1            flexible Paris
## 2            flexible Paris
## 3            flexible Paris
## 4            flexible Paris
## 5            flexible Paris
## 6            flexible Paris
##                                                   address       neighbourhood
## 1       Rue Legendre, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 2   Avenue Mac-Mahon, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 3   Rue la Condamine, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 4        Rue Gauthey, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France Batignolles-Monceau
## 6    Rue de Saussure, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
##   city_quarter latitude longitude security_deposit transit host_response_time
## 1        75017 48.88880  2.320466               94                        N/A
## 2        75017 48.87664  2.293724                1                        N/A
## 3        75017 48.88384  2.321031              208             within an hour
## 4        75017 48.89236  2.322338              106               within a day
## 5        75017 48.88942  2.298321                1             within an hour
## 6        75017 48.88707  2.312212                1                        N/A
##   superhost host_since listing_count host_score reviews_per_month
## 1         f 2013-10-29             1        100              0.07
## 2         f 2015-06-14             1         NA                NA
## 3         f 2014-04-06             1         80              0.11
## 4         f 2013-02-16             1         80              0.15
## 5         f 2015-12-13             1        100              2.00
## 6         f 2015-03-08             1         NA                NA
##   number_of_reviews
## 1                 1
## 2                 0
## 3                 1
## 4                 1
## 5                 2
## 6                 0

And also the summary:

summary(data)
##    listing_id          host_id            host_name       bathrooms   
##  Min.   :    2623   Min.   :    2626   Marie   :  583   Min.   :0.00  
##  1st Qu.: 3470301   1st Qu.: 6158190   Nicolas :  436   1st Qu.:1.00  
##  Median : 6965852   Median :15885410   Pierre  :  418   Median :1.00  
##  Mean   : 7069608   Mean   :22485601   Caroline:  388   Mean   :1.09  
##  3rd Qu.:10740059   3rd Qu.:34348717   Anne    :  387   3rd Qu.:1.00  
##  Max.   :13819560   Max.   :81397049   Sophie  :  372   Max.   :8.00  
##                                        (Other) :50141                 
##     bedrooms           beds                 bed_type    
##  Min.   : 0.000   Min.   : 0.000   Airbed       :   35  
##  1st Qu.: 1.000   1st Qu.: 1.000   Couch        : 1182  
##  Median : 1.000   Median : 1.000   Futon        :  449  
##  Mean   : 1.059   Mean   : 1.684   Pull-out Sofa: 5066  
##  3rd Qu.: 1.000   3rd Qu.: 2.000   Real Bed     :45993  
##  Max.   :10.000   Max.   :16.000                        
##                                                         
##                                                                           equipments   
##  {}                                                                            :  552  
##  {TV,Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials}           :   95  
##  {Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials}              :   90  
##  {Internet,"Wireless Internet",Kitchen,Heating,Essentials}                     :   68  
##  {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials}:   64  
##  {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer}           :   64  
##  (Other)                                                                       :51792  
##               type                    room        nb_of_guests   
##  Apartment      :50663   Entire home/apt:45177   Min.   : 1.000  
##  Loft           :  567   Private room   : 7001   1st Qu.: 2.000  
##  House          :  537   Shared room    :  547   Median : 2.000  
##  Bed & Breakfast:  394                           Mean   : 3.051  
##  Condominium    :  266                           3rd Qu.: 4.000  
##  Other          :  122                           Max.   :16.000  
##  (Other)        :  176                                           
##      price         guests_included  minimum_nights     maximum_nights     
##  Min.   :   0.00   Min.   : 0.000   Min.   :   1.000   Min.   :1.000e+00  
##  1st Qu.:  55.00   1st Qu.: 1.000   1st Qu.:   1.000   1st Qu.:6.000e+01  
##  Median :  75.00   Median : 1.000   Median :   2.000   Median :1.125e+03  
##  Mean   :  96.51   Mean   : 1.353   Mean   :   3.128   Mean   :1.253e+05  
##  3rd Qu.: 110.00   3rd Qu.: 2.000   3rd Qu.:   3.000   3rd Qu.:1.125e+03  
##  Max.   :6081.00   Max.   :16.000   Max.   :1000.000   Max.   :2.147e+09  
##                                                                           
##  availability_over_one_year instant_bookable      cancellation_policy
##  Min.   :  0.0              f:44186          flexible       :19244   
##  1st Qu.: 22.0              t: 8539          moderate       :15039   
##  Median :183.0                               strict         :18427   
##  Mean   :179.5                               super_strict_30:    6   
##  3rd Qu.:336.0                               super_strict_60:    9   
##  Max.   :365.0                                                       
##                                                                      
##                        city      
##  Paris                   :50825  
##  Paris-15E-Arrondissement:  115  
##  Paris-19E-Arrondissement:  106  
##  Paris-20E-Arrondissement:   87  
##  Paris-18E-Arrondissement:   77  
##  Paris-16E-Arrondissement:   76  
##  (Other)                 : 1439  
##                                                               address     
##  Paris, ÃŽle-de-France, France                                    :  308  
##  Boulevard Voltaire, Paris, ÃŽle-de-France 75011, France          :  209  
##  Rue du Faubourg Saint-Martin, Paris, ÃŽle-de-France 75010, France:  202  
##  Rue Oberkampf, Paris, ÃŽle-de-France 75011, France               :  202  
##  Rue Saint-Maur, Paris, ÃŽle-de-France 75011, France              :  196  
##  Rue de Charenton, Paris, ÃŽle-de-France 75012, France            :  188  
##  (Other)                                                          :51420  
##  neighbourhood       city_quarter      latitude       longitude    
##  Length:52725       75018  : 5973   Min.   :48.81   Min.   :2.221  
##  Class :character   75011  : 4825   1st Qu.:48.85   1st Qu.:2.323  
##  Mode  :character   75015  : 3799   Median :48.86   Median :2.347  
##                     75010  : 3511   Mean   :48.86   Mean   :2.344  
##                     75017  : 3465   3rd Qu.:48.88   3rd Qu.:2.369  
##                     75020  : 2859   Max.   :48.91   Max.   :2.475  
##                     (Other):28293                                  
##  security_deposit
##  Min.   :  1.00  
##  1st Qu.:  1.00  
##  Median : 58.00  
##  Mean   : 81.57  
##  3rd Qu.:129.00  
##  Max.   :304.00  
##                  
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 transit     
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     :18546  
##  Public transportation is a bit of a maze in Paris. I recommend you to book a transfer on the app Bonjour Paris (G00gle or Apple store).                                                                                                                                                                                                                                                                                                                                                            :   16  
##  DIRECT ACCESS From Airport CDG (Charles de Gaule-Roissy)  DIRECT ACCESS From Airport  ORLY EASY & FAST ACCESS from TRAIN STATIONS METRO Station Saint Michel line 4 is 3 minutes by foot from my place RER Station  Saint Michel line B is 3 minutes by foot from my place TAXI STATION is 3 minutes by foot from my place By CAR : 2 choices of PARKING both 5 minutes by foot from my place : â\200œParking Saint Michelâ\200\235 Rue Francisque Gay n°46 and â\200œParking Notre Dameâ\200\235 Place Jean Paul II:   12  
##  Subway: Châtelet (lines 1, 4, 7, 11 & 14, RER A, B & D)                                                                                                                                                                                                                                                                                                                                                                                                                                           :   12  
##  Odéon station line 4 and 10 Saint Michel station line 4, RER B and RER C                                                                                                                                                                                                                                                                                                                                                                                                                          :   10  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            :34128  
##  NA's                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               :    1  
##           host_response_time superhost      host_since    listing_count    
##                    :   46     :   46   2012-05-04:  166   Min.   :  1.000  
##  a few days or more:  996    f:50513   2012-06-18:  165   1st Qu.:  1.000  
##  N/A               :12517    t: 2166   2012-10-25:  155   Median :  1.000  
##  within a day      :10201              2014-03-10:  135   Mean   :  4.087  
##  within a few hours:13926              2015-07-29:  128   3rd Qu.:  1.000  
##  within an hour    :15039              2013-07-20:  116   Max.   :155.000  
##                                        (Other)   :51860                    
##    host_score     reviews_per_month number_of_reviews
##  Min.   : 20.00   Min.   : 0.010    Min.   :  0.00   
##  1st Qu.: 87.00   1st Qu.: 0.360    1st Qu.:  0.00   
##  Median : 93.00   Median : 0.900    Median :  3.00   
##  Mean   : 91.01   Mean   : 1.336    Mean   : 12.59   
##  3rd Qu.: 97.00   3rd Qu.: 1.870    3rd Qu.: 13.00   
##  Max.   :100.00   Max.   :14.290    Max.   :392.00   
##  NA's   :15454    NA's   :14508

Analysis

Relationship between prices and apartment features:

  • Price:
summary(data$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   55.00   75.00   96.51  110.00 6081.00
p1<- ggplot(data) + 
  geom_histogram(aes(price), fill = "#971a4a", alpha = 0.85, binwidth = 15) + 
  theme_minimal(base_size = 13) + 
  xlab("Price") + 
  ylab("Frequency") + 
  ggtitle("Distribution of Price")

p2 <- ggplot(data, aes(price)) +
  geom_histogram(bins = 30, aes(y = after_stat(density)), fill = "#971a4a") + 
  geom_density(alpha = 0.2, fill = "#971a4a") + 
  ggtitle("Logarithmic distribution of Price", subtitle = expression("With" ~'log'[10] ~ "transformation of x-axis")) + 
  scale_x_log10()


ggarrange(p1,
          p2,
          nrow = 1,
          ncol=2,
          labels = c("1. ", "2. "))
## Warning: Transformation introduced infinite values in continuous x-axis

## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing non-finite values (`stat_density()`).

The log-scaled histogram gives a clearer view of the price distribution, which is strongly right-skewed: the median is 75 while the maximum reaches 6081.
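
The warnings about infinite and non-finite values come from the log scale itself: the minimum price in the summary is 0, and the logarithm of 0 is not finite, so such rows cannot be placed on the axis. A small sketch, using a handful of values taken from the price summary rather than the full column:

```r
# A few prices from the summary above (min, quartiles, max).
prices <- c(0, 55, 75, 110, 6081)

# log10(0) is -Inf, which a log-scaled axis cannot display.
log_prices <- log10(prices)

# Flag the values that would be dropped from the plot.
is_dropped <- !is.finite(log_prices)
sum(is_dropped)
```

Filtering out listings with a price of 0 before plotting would be one way to avoid the warning, assuming those listings are data errors.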

  • Property type:
data %>%distinct(type)
##               type
## 1        Apartment
## 2            House
## 3      Condominium
## 4             Loft
## 5            Other
## 6  Bed & Breakfast
## 7                 
## 8             Dorm
## 9        Townhouse
## 10            Boat
## 11           Villa
## 12            Tent
## 13           Cabin
## 14            Tipi
## 15       Camper/RV
## 16            Cave
## 17          Chalet
## 18       Treehouse
## 19     Earth House
## 20           Igloo

Listing types according to the property types:

property_type_count <- table(data$type)
property_types_counts <- table(data$type,exclude=names(property_type_count[property_type_count[] < 4000]))
others <- sum(as.vector(property_type_count[property_type_count[] < 4000]))
property_types_counts['Others'] <- others
property_types <- names(property_types_counts)
counts <- as.vector(property_types_counts)
percentages <- scales::percent(round(counts/sum(counts), 2))
property_types_percentages <- sprintf("%s (%s)", property_types, percentages)
property_types_counts_df <- data.frame(group = property_types, value = counts)
res1 <- ggplot(property_types_counts_df, aes(x="",y=value, fill=property_types_percentages)) +
  geom_bar(width = 1,stat = "identity") +
  coord_polar("y",start = 0) +
  scale_fill_brewer("Property Types",palette = "BuPu")+
  ggtitle("Listings according to property types") +
  theme(plot.title = element_text(color = "Black", size = 12, hjust = 0.5))+
  ylab("") +
  xlab("") +
  theme(axis.ticks = element_blank(), panel.grid = element_blank(), axis.text = element_blank()) +
  geom_text(aes(label = percentages), size= 4, position = position_stack(vjust = 0.5))

res1

96% of the listings are of type apartment.
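
The grouping step behind this chart, where every property type below a count threshold is merged into a single "Others" slice, can be sketched in a few lines of base R. The toy vector and the threshold of 2 are hypothetical; the code above uses 4000 on the real data.

```r
# Hypothetical property types: one dominant category and two rare ones.
types <- c(rep("Apartment", 8), "Loft", "House")
counts <- table(types)

threshold <- 2  # hypothetical cut-off for this toy example

# Keep the categories at or above the threshold as a plain named vector.
major <- counts[counts >= threshold]
major <- setNames(as.vector(major), names(major))

# Merge everything below the threshold into one "Others" entry.
lumped <- c(major, Others = sum(counts[counts < threshold]))

# Share of each slice, in percent.
shares <- round(100 * lumped / sum(lumped))
```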

Distribution of the price for each property type:

ggplot(data) +  
  geom_boxplot(aes(x = type,y = price,fill = type)) +
  labs(x = "Property Type",y = "Price",fill = "Property Type") +  
  coord_flip()

We can see that some property types are more expensive than average: Villa, Townhouse, House and Camper/RV. Since 96% of the listings in the dataset are apartments, all the other property types together account for less than 4% of the listings.

  • Room type:
data %>%distinct(room)
##              room
## 1 Entire home/apt
## 2    Private room
## 3     Shared room

Listing types according to the room type:

room_types_counts <- table(data$room)
room_types <- names(room_types_counts)
counts <- as.vector(room_types_counts)
percentages <- scales::percent(round(counts/sum(counts), 2))
room_types_percentages <- sprintf("%s (%s)", room_types, percentages)
room_types_counts_df <- data.frame(group = room_types, value = counts)

res2 <- ggplot(room_types_counts_df, aes(x = "", y = value, fill = room_types_percentages)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0) +
  scale_fill_brewer("Room Types", palette = "BuPu") +
  ggtitle("Listing types according to Room types") +
  theme(plot.title = element_text(color = "black", size = 12, hjust = 0.5)) +
  ylab("") +
  xlab("") +
  labs(fill="") +
  theme(axis.ticks = element_blank(), panel.grid = element_blank(), axis.text = element_blank()) +
  geom_text(aes(label = percentages), size = 5, position = position_stack(vjust = 0.5))

res2

There are three room types: Entire home/apt, Private room and Shared room. Among those, 86% of the listings are entire homes or apartments.

Price by room type:

ggplot(data)+ 
  geom_boxplot(aes(x = room,y = price, fill = room)) + 
  labs(x = "Room Type", y = "Price", fill = "Room Type")+ 
  coord_flip()

The price increases in this order: shared room < private room < entire home/apt. Let’s have a look at the average price by room type:

data %>% 
     group_by(room) %>% 
     summarise(mean_price = mean(price, na.rm = TRUE)) %>% 
     ggplot(aes(x = reorder(room, mean_price), y = mean_price)) +
     # geom_col() always uses stat = "identity", so the argument is not needed
     geom_col(fill = "#971a4a") +
     coord_flip() +
     theme_minimal() +
     geom_text(aes(label = round(mean_price, digits = 2)), hjust = 1.0, color = "white", size = 4.5) +
     ggtitle("Mean Price / Room Types") + 
     xlab("Room Type") + 
     ylab("Mean Price")
## `summarise()` ungrouping output (override with `.groups` argument)

  • Cancellation policy / Host response time:
price_cancellation_policy <- ggplot(data = data, 
  aes(x = cancellation_policy, y = price, color=cancellation_policy)) +
  geom_boxplot(outlier.shape = NA) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(plot.title = element_text(color = "#971a4a", size = 12, face = "bold", hjust = 0.5))+
  coord_cartesian(ylim = c(0, 500))

host_data_without_null_host_response_time <- subset(data, host_response_time != "N/A" & host_response_time != "")

price_response_time <- ggplot(data = host_data_without_null_host_response_time, 
  aes(x = host_response_time, y = price, color = host_response_time)) + 
  geom_boxplot(outlier.shape = NA) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(plot.title = element_text(color = "#971a4a", size = 12, face = "bold", hjust = 0.5)) +
  coord_cartesian(ylim = c(0, 500))

ggarrange(price_response_time,
          price_cancellation_policy,
          nrow = 1,
          ncol = 2,
          labels = c("1. ", "2. "))

The first graph shows no relation between the host response time and the price. The second graph shows that the cancellation policy does have an impact: the price is higher or lower depending on the policy type.
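A numeric summary can back up what the boxplots suggest. The sketch below computes the median price per cancellation policy with base R; the `toy` data frame and its values are hypothetical and only mirror the column names used above.

```r
# Hypothetical toy data; column names mirror the dataset, values are invented
toy <- data.frame(
  cancellation_policy = c("flexible", "flexible", "moderate",
                          "moderate", "strict", "strict"),
  price = c(50, 70, 80, 100, 120, 160)
)

# Median price within each cancellation policy
median_by_policy <- tapply(toy$price, toy$cancellation_policy, median)

# Show the policies from cheapest to most expensive
print(sort(median_by_policy))
```

The median is used here rather than the mean because the boxplots hide the outliers, and the median is less sensitive to them.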

  • Instant bookable:
ggplot(data = data, aes(x = instant_bookable, y = price, color = instant_bookable)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(0, 500))

The price shows no clear dependency on this feature.

  • Availability:
ggplot(data, aes(availability_over_one_year, price)) +
  geom_point(alpha = 0.2, color = "#971a4a") +
  geom_density(stat = "identity", alpha = 0.2) +
  xlab("Availability over a year") +
  ylab("Price") +
  ggtitle("Relationship between availability and price") 

The price shows no clear dependency on this feature.
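One way to quantify a visual impression like this is a rank correlation. The sketch below is only an illustration on randomly generated toy data (the values are not from the AirBnB dataset); it assumes the same column names, `availability_over_one_year` and `price`.

```r
# Randomly generated toy data, independent by construction
set.seed(42)
toy <- data.frame(
  availability_over_one_year = sample(0:365, 200, replace = TRUE),
  price = round(rlnorm(200, meanlog = 4.5, sdlog = 0.5))
)

# Spearman's rho measures monotonic association and is robust to skewed prices
rho <- cor(toy$availability_over_one_year, toy$price,
           method = "spearman", use = "complete.obs")
print(rho)
```

A value of rho close to 0 would be consistent with the absence of a clear dependency, while values near -1 or 1 would indicate a strong monotonic relationship.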

Number of apartments per owner:

  • Hosts:
count_by_host_1 <- data %>% 
    group_by(host_id) %>%
    summarise(number_apt_by_host = n()) %>%
    ungroup() %>%
    mutate(groups = case_when(
        number_apt_by_host == 1 ~ "001",
        between(number_apt_by_host, 2, 50) ~ "002-050",
        number_apt_by_host > 50 ~ "051-153"))
## `summarise()` ungrouping output (override with `.groups` argument)
count_by_host_2 <- count_by_host_1 %>%
    group_by(groups) %>%
    summarise(counting = n()) %>%
    arrange(desc(counting)) # order the groups by number of hosts, descending
## `summarise()` ungrouping output (override with `.groups` argument)
num_apt_by_host_id <- (ggplot(count_by_host_2, aes(x = "", y = counting)) +  
              geom_col(aes(fill = factor(groups)), color = "white") + 
              geom_text(aes(y = counting / 1.23, label = counting),color = "black",size = 4)+ 
              labs(x = "", y = "", fill = "Number of apartments per owner") + 
              coord_polar(theta = "y"))+
              theme_minimal()

superhost <- (ggplot(data) + 
                geom_bar(aes(x='' , fill=superhost)) +
                coord_polar(theta='y') +
                scale_fill_brewer(palette="BuPu")) +
                theme_minimal()

ggarrange(num_apt_by_host_id,
          superhost,
          nrow=2,
          ncol=1,
          align = "hv")

Most of the hosts have only one listing (41548 hosts). There is also a minority of superhosts.
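The grouping of hosts by number of listings can also be sketched in base R with `cut()`. This is a minimal illustration on an invented `host_id` vector, using an open-ended last bucket as an assumption instead of a fixed upper bound.

```r
# Toy host ids, one entry per listing (values are made up)
host_id <- c(1, 1, 2, 3, 3, 3, 4)

# Number of listings per host: host 1 has 2, host 2 has 1, host 3 has 3, host 4 has 1
listings_per_host <- table(host_id)

# Bucket the counts: exactly 1, 2 to 50, and more than 50 listings
bucket <- cut(as.vector(listings_per_host),
              breaks = c(0, 1, 50, Inf),
              labels = c("1", "2-50", "51+"))

# Number of hosts in each bucket
bucket_counts <- table(bucket)
print(bucket_counts)
```

An open-ended last bucket avoids hard-coding the maximum number of listings in the label.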

Top 20 hosts in Paris:

count_by_host_3 <- data %>%
  group_by(host_id) %>%
  summarise(number_apt_by_host = n()) %>%
  arrange(desc(number_apt_by_host))
## `summarise()` ungrouping output (override with `.groups` argument)
top_listings_by_host <- count_by_host_3 %>%
  top_n(n=20, wt = number_apt_by_host)

# top_n() keeps ties, so more than 20 rows can be returned
top_listings_by_host
## # A tibble: 22 x 2
##     host_id number_apt_by_host
##       <int>              <int>
##  1  2288803                155
##  2  2667370                139
##  3 12984381                 91
##  4  3972699                 80
##  5  3943828                 65
##  6 21630783                 65
##  7 39922748                 64
##  8   789620                 60
##  9 11593703                 56
## 10  3971743                 55
## # ... with 12 more rows
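The table has 22 rows rather than 20 because ties at the cut-off are kept. The base R sketch below illustrates the difference between keeping and dropping ties on an invented `counts` data frame; in dplyr, `slice_max()` with `with_ties = FALSE` is an alternative way to get exactly `n` rows.

```r
# Toy counts per host (values are made up); hosts 103 and 104 are tied
counts <- data.frame(host_id = c(101, 102, 103, 104, 105),
                     n = c(9, 7, 5, 5, 3))
k <- 3

# Keeping ties: every row at least as large as the k-th largest value
threshold <- sort(counts$n, decreasing = TRUE)[k]
with_ties <- counts[counts$n >= threshold, ]

# Dropping ties: exactly k rows after sorting in descending order
ordered <- counts[order(-counts$n), ]
exactly_k <- head(ordered, k)

print(nrow(with_ties))
print(nrow(exactly_k))
```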

Renting price per city quarter:

listings_quarter <- ggplot(data, aes(x = fct_infreq(neighbourhood), fill = room)) +
    geom_bar() +
    labs(title = "Nb. Listings per city quarter",
         x = "Neighbourhood", y = "Nb. of listings") +
    theme(legend.position = "bottom",axis.text.x = element_text(angle = 90, hjust = 1), 
          plot.title = element_text(color = "black", size = 12,  hjust = 0.5))

average_prices <- aggregate(cbind(data$price),
                  by = list(arrond = data$city_quarter),
                  FUN = function(x) mean(x))

price <- ggplot(data = average_prices, aes(x = arrond, y = V1)) +
    geom_bar(stat = "identity", fill = "#971a4a", width = 0.7) +
  geom_text(aes(label = round(V1, 2)), size=4) +
    coord_flip() +
    labs(title = "Average daily price per city quarter", 
         x = "City quarters", y = "Average daily price") +
    theme(legend.position = "bottom",axis.text.x = element_text(angle = 90, hjust = 1), 
          plot.title = element_text(color = "black", size = 12,  hjust = 0.5))

ggarrange(listings_quarter,
          price,
          nrow =1,
          ncol = 2,
          labels = c("1. ", "2. "))

Top 10 neighborhoods:

data %>%
  group_by(neighbourhood) %>%
  dplyr::summarize(num_listings = n(), borough = unique(neighbourhood)) %>%
  top_n(n = 10, wt = num_listings) %>%
  ggplot(aes(x = fct_reorder(neighbourhood, num_listings), y = num_listings, fill = borough)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 neighborhoods by nb. of listings", x = "Neighbourhood", y = "Nb. of listings")
## `summarise()` ungrouping output (override with `.groups` argument)

Rented apartments in the past years:

table <- inner_join(data, R, by = "listing_id")
table = mutate(table, year = as.numeric(str_extract(table$date, "^\\d{4}")))
table["date"] <- table["date"] %>% map(., as.Date)

longitudinal  <- table %>%
  group_by(date, neighbourhood) %>%
  summarise(count_obs = n())
## `summarise()` regrouping output by 'date' (override with `.groups` argument)
time_location <- (ggplot(longitudinal, aes(x = date,  y = count_obs, group = 1)) +
                  # `linewidth` replaces `size` for lines (deprecated in ggplot2 3.4.0)
                  geom_line(linewidth = 0.5, colour = "lightblue") +
                  stat_smooth(color = "#971a4a", method = "loess") +
                  scale_x_date(date_labels = "%Y") +
                  labs(x = "Year", y = "Nb. of rented apartments") +
                  facet_wrap(~ neighbourhood))
time_location
## `geom_smooth()` using formula = 'y ~ x'

The most visited and rented locations in Paris are the cheapest ones.

Map representing the price range within Paris neighborhoods (prices are higher the closer we are to the center of Paris):

height <- max(data$latitude) - min(data$latitude)
width <- max(data$longitude) - min(data$longitude)

paris_limits <- c(bottom = min(data$latitude)  - 0.1 * height, 
                top = max(data$latitude)  + 0.1 * height,
                left = min(data$longitude) - 0.1 * width,
                right = max(data$longitude) + 0.1 * width)

map <- get_stamenmap(paris_limits, zoom = 12)
## i Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
ggmap(map) +
  geom_point(data = data, mapping = aes(x = longitude, y = latitude, col = log(price))) +
  scale_color_distiller(palette = "BuPu", direction = 1)
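The bounding box computation used for the map can be wrapped in a small helper. The function name `padded_bbox` and its `pad` parameter are hypothetical; the sketch only reproduces the 10% margin logic on invented coordinates.

```r
# Hypothetical helper: bounding box around the points with a relative margin
padded_bbox <- function(lon, lat, pad = 0.1) {
  width  <- diff(range(lon, na.rm = TRUE))
  height <- diff(range(lat, na.rm = TRUE))
  c(left   = min(lon, na.rm = TRUE) - pad * width,
    bottom = min(lat, na.rm = TRUE) - pad * height,
    right  = max(lon, na.rm = TRUE) + pad * width,
    top    = max(lat, na.rm = TRUE) + pad * height)
}

# Invented coordinates roughly in the Paris area
bbox <- padded_bbox(lon = c(2.25, 2.42), lat = c(48.81, 48.90))
print(bbox)
```

With the real dataset, the call would be `padded_bbox(data$longitude, data$latitude)`.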

Visit frequency of the different quarters according to time:

table <- inner_join(data, R,by = "listing_id")
table = mutate(table, year = as.numeric(str_extract(table$date, "^\\d{4}")))
     
res3 <- ggplot(table) +
  geom_bar(aes(y =city_quarter ,fill=factor(year))) +
  scale_size_area() +
  labs( x="Frequency", y="City quarter",fill="Year") +
  scale_fill_brewer(palette ="BuPu")
    
ggplotly(res3)
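The counts behind this bar chart can be inspected as a cross-tabulation of quarter and year. The sketch below uses an invented `toy` data frame with the same `city_quarter` and `date` columns, and extracts the year with `as.Date()` and `format()` as an alternative to the regular expression.

```r
# Toy joined table (values are made up): one row per visit
toy <- data.frame(
  city_quarter = c(1, 1, 2, 2, 2),
  date = c("2015-03-14", "2016-11-02", "2016-07-21", "2016-01-05", "2015-12-30")
)

# Extract the year from the ISO-formatted date
toy$year <- as.integer(format(as.Date(toy$date), "%Y"))

# Number of visits per quarter (rows) and year (columns)
visits <- table(city_quarter = toy$city_quarter, year = toy$year)
print(visits)
```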

Map representation

For a clearer view of the data, the listings are displayed with Leaflet. The map is interactive: you can move, click and arrange the display as you wish:

Neighborhood listings map:

df <- select(data, longitude, neighbourhood, latitude, price)
leaflet(df) %>%
  setView(lng = 2.3488, lat = 48.8534, zoom = 12) %>%
   addTiles() %>% 
  addMarkers(clusterOptions = markerClusterOptions()) %>%
  addMiniMap()
## Assuming "longitude" and "latitude" are longitude and latitude, respectively

Superhost listings map:

# Keep only the superhost listings, then the columns needed for the map
dfsuperhost <- data %>%
  filter(superhost == "t") %>%
  select(longitude, neighbourhood, latitude, price)
leaflet(dfsuperhost) %>%
  setView(lng = 2.3488, lat = 48.8534 ,zoom = 12) %>%
   addTiles() %>% 
  addMarkers(clusterOptions = markerClusterOptions()) %>%
  addMiniMap()
## Assuming "longitude" and "latitude" are longitude and latitude, respectively

Conclusion

After the analysis of the AirBnB dataset, one can conclude that the majority of the listings are of type entire home/apartment, which is also the most expensive room type. The price depends on features of the listing such as the cancellation policy and the neighborhood where it is located.

Most of the hosts have only one listing, but some have several; the host with the highest number of listings has 155.

The closer the apartment is to the center of Paris, the more expensive it is. The neighborhood in Paris with the highest number of listings is Buttes-Montmartre with 5952 listings, which is also the neighborhood with the highest number of rented apartments in the past years.

People visit entire home/apartment listings the most, especially in the Buttes-Montmartre neighborhood, where there are more listings.

Finally, superhosts are a minority among hosts. This is probably because a superhost needs to be more active on the platform, host several guests in a year and receive positive feedback from them to be rated as a superhost by AirBnB.